The Robust-Algorithm Approach to Fault Tolerance on Processor Arrays: Fault Models, Fault Diameter, and Basic Algorithms
نویسندگان
چکیده
With few exceptions, the two issues of algorithm design and fault tolerance for processor arrays have been dealt with separately, in that algorithm developers have assumed the availability of complete fault-free arrays and fault tolerance techniques have aimed at restoring such complete arrays by reconfiguring faulty ones. We present the design of robust algorithms that run efficiently on complete arrays but are also tolerant of faulty processors/links in a degraded mode. This is a complementary approach in that our algorithms can be used on reconfigurable arrays that tolerate a certain number of faults while maintaining their regularity, with the graceful degradation feature kicking in once the fault tolerance limit of the reconfiguration scheme is exceeded. The fault models considered in this paper comprise of the faulty processors/links being removed from the pool of resources (removal model) or bypassed in their respective rows/columns (bypass model). We discuss the two models, derive tight upper bounds for the fault diameter of the resulting networks, and present building-block algorithms for semigroup computation, parallel prefix computation, data rearrangement, matrix multiplication, and sorting.
منابع مشابه
Reliability and Performance Evaluation of Fault-aware Routing Methods for Network-on-Chip Architectures (RESEARCH NOTE)
Nowadays, faults and failures are increasing especially in complex systems such as Network-on-Chip (NoC) based Systems-on-a-Chip due to the increasing susceptibility and decreasing feature sizes. On the other hand, fault-tolerant routing algorithms have an evident effect on tolerating permanent faults and improving the reliability of a Network-on-Chip based system. This paper presents reliabili...
متن کاملImproving the palbimm scheduling algorithm for fault tolerance in cloud computing
Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...
متن کاملFDMG: Fault detection method by using genetic algorithm in clustered wireless sensor networks
Wireless sensor networks (WSNs) consist of a large number of sensor nodes which are capable of sensing different environmental phenomena and sending the collected data to the base station or Sink. Since sensor nodes are made of cheap components and are deployed in remote and uncontrolled environments, they are prone to failure; thus, maintaining a network with its proper functions even when und...
متن کاملIdentification and Robust Fault Detection of Industrial Gas Turbine Prototype Using LLNF Model
In this study, detection and identification of common faults in industrial gas turbines is investigated. We propose a model-based robust fault detection(FD) method based on multiple models. For residual generation a bank of Local Linear Neuro-Fuzzy (LLNF) models is used. Moreover, in fault detection step, a passive approach based on adaptive threshold is employed. To achieve this purpose, the a...
متن کاملNovel efficient fault-tolerant full-adder for quantum-dot cellular automata
Quantum-dot cellular automata (QCA) are an emerging technology and a possible alternative for semiconductor transistor based technologies. A novel fault-tolerant QCA full-adder cell is proposed: This component is simple in structure and suitable for designing fault-tolerant QCA circuits. The redundant version of QCA full-adder cell is powerful in terms of implementing robust digital functions. ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1998